High Performance Stencil Code Algorithms for GPGPUs

نویسندگان

  • Andreas Schäfer
  • Dietmar Fey
چکیده

In this paper we investigate how stencil computations can be implemented on state-of-the-art general purpose graphics processing units (GPGPUs). Stencil codes can be found at the core of many numerical solvers and physical simulation codes and are therefore of particular interest to scientific computing research. GPGPUs have gained a lot of attention recently because of their superior floating point performance and memory bandwidth. Nevertheless, especially memory bound stencil codes have proven to be challenging for GPGPUs, yielding lower than to be expected speedups. We chose the Jacobi method as a standard benchmark to evaluate a set of algorithms on NVIDIA’s latest Fermi chipset. One of our fastest algorithms is a parallel wavefront update. It exploits the enlarged on-chip shared memory to perform two time step updates per sweep. To the best of our knowledge, it represents the first successful application of temporal blocking for 3D stencils on GPGPUs and thereby exceeds previous results by a considerable margin. It is also the first paper to study stencil codes on Fermi.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Wave Equation Based Stencil Optimizations on Multi-core CPU

As the engine for seismic imaging algorithms, stencil kernels modeling wave propagation are both computeand memoryintensive. This work targets improving the performance of wave equation based stencil code parallelized by OpenMP on a multi-core CPU. To achieve this goal, we explored two techniques: improving vectorization by using hardware SIMD technology, and reducing memory traffic to mitigate...

متن کامل

Parallel visual data restoration on multi-GPGPUs using stencil-reduce pattern

In this paper, a highly effective parallel filter for visual data restoration is presented. The filter is designed following a skeletal approach, using a newly proposed stencil-reduce, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, it is possible to run the filter seamlessly on a multicore machine, on multi-GPGPUs, or on both....

متن کامل

Towards a performance-portable description of geometric multigrid algorithms using a domain-specific language

High Performance Computing (HPC) systems are nowadays more and more heterogeneous. Different processor types can be found on a single node including accelerators such as Graphics Processing Units (GPUs). To cope with the challenge of programming such complex systems, this work presents a domain-specific approach to automatically generate code tailored to different processor types. Low-level CUD...

متن کامل

ExaStencils: Advanced Stencil-Code Engineering

Project ExaStencils pursues a radically new approach to stencil-code engineering. Present-day stencil codes are implemented in general-purpose programming languages, such as Fortran, C, or Java, or derivates thereof, and harnesses for parallelism, such as OpenMP, OpenCL or MPI. ExaStencils favors a much more domain-specific approach with languages at several layers of abstraction, the most abst...

متن کامل

Advanced Stencil-Code Engineering

This report documents the program and the outcomes of Dagstuhl Seminar 15161 “Advanced Stencil-Code Engineering”. The seminar was hosted by the DFG project with the same name (ExaStencils for short) in the DFG priority programme “Software for Exascale Computing” (SPPEXA). It brought together experts from mathematics, computer science and applications to explore the challenges of very high perfo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011